Dynamics of the Fisher Information Metric 



Xavier Calmet 1 and Jacques Calmet 2

1 University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
2 Institute for Algorithms and Cognitive Systems (IAKS),
University of Karlsruhe (TH), D-76131 Karlsruhe, Germany



Abstract 

We present a method to generate probability distributions that correspond to metrics
obeying partial differential equations generated by extremizing a functional
$J[g_{\mu\nu}(\theta^i)]$, where $g_{\mu\nu}(\theta^i)$ is the Fisher metric. We postulate that this
functional of the dynamical variable $g_{\mu\nu}(\theta^i)$ is stationary with respect to small
variations of this variable. Our approach provides a dynamical treatment of the Fisher
information metric and allows one to impose symmetries on a statistical system in a
systematic way. This work is mainly motivated by the entropy approach to nonmonotonic
reasoning.

1 Introduction



Nonmonotonic reasoning is basically a static process. Whether it is possible to introduce
some sort of dynamics into a reasoning process is an open question. One motivation is
that nonmonotonic reasoning leads to solutions of classification and degree-of-belief
problems, and dynamics would allow one to consider or introduce symmetries into these
problems in a very systematic way. The question is thus whether, and how, dynamical
reasoning processes can be defined. The seminal paper [S] presented a maximum entropy
approach to nonmonotonic reasoning. In ^H] Stachniak points out the link between
universal algebras and AI, with a special emphasis on nonmonotonic reasoning. There
is a close connection between satisfiability and preferential matrix representation ideas.
The matrix representation arises from a set of weighted equations, and an open problem
is to give a meaning to these weights. Furthermore, this approach is closely linked to the
formulation of the problem in terms of many-valued logics when reasoning is seen as an
inference process. This brief survey suggests a first avenue, namely entropy. Indeed, the
Fisher information metric jH] is closely related to entropy (it arises as the second-order
term of the relative entropy, see section 2), and it is expressed as a matrix equation that
can be linked to the semantics defined by Stachniak. We can then use tools from physics
to introduce dynamical features into the Fisher metric formalism. Although these tools
are well known to physicists, it appears that such a study has never been carried out.
We have mentioned only nonmonotonic reasoning since it is closer to the research
interests of one of the authors, but we could also have cited quantum computing, where
dynamical features of entropy ought to bring new ideas. In this paper we concentrate on
the theoretical problem of introducing dynamics into the Fisher formalism.

The Fisher information metric can be calculated once a probability distribution has
been chosen. In this work we present a method to generate probability distributions
that correspond to metrics obeying partial differential equations (Euler-Lagrange
PDEs) generated by extremizing a functional $J[g_{\mu\nu}(\theta^i)]$, where $g_{\mu\nu}(\theta^i)$ is the
Fisher metric. We postulate that this functional of the dynamical variable $g_{\mu\nu}(\theta^i)$ is
stationary with respect to small variations of this variable. This provides a dynamical
treatment of the Fisher information metric and allows one to impose symmetries on a
statistical system in a systematic way. We will show how to obtain partial differential
equations for the probability distributions. Requiring that the functional remain
invariant under certain transformations of the coordinates $\theta^i$ constrains the class of
probability distributions. These symmetries can also be broken in a consistent way using
methods borrowed from theoretical particle physics and general relativity. As an example
we will require that the functional be covariant under general transformations of the
coordinates $\theta^i$. This leads to nonlinear partial differential equations corresponding to
Einstein's equations [5] of general relativity (see e.g. ^H] for a nice introduction to
general relativity). The requirement that the functional remain invariant under
infinitesimal general coordinate transformations $\theta^i \to \theta'^i$ leads to the Bianchi identities.

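A set of distributions $p_\theta = p(\theta)$ parametrized by $\theta^i$ with $i \in \{1, \ldots, n\}$ is a manifold.
The Riemannian metric on this manifold is the Fisher information metric, defined by:

$$ g_{\mu\nu} = \int dx \, p_\theta(x) \, \frac{\partial \ln p_\theta(x)}{\partial \theta^\mu} \, \frac{\partial \ln p_\theta(x)}{\partial \theta^\nu}. \tag{1} $$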

In the sequel we shall first briefly introduce the concept of the Fisher information
metric, insisting on its connection to the Shannon entropy and to the Kullback-Leibler
distance. We shall then introduce dynamics into the Fisher information metric. We
shall argue that requiring the covariance of the functional $J[g_{\mu\nu}(\theta^i)]$ under general
transformations of $\theta^i$ is natural, since the Fisher information metric is itself invariant
under reparametrization of the manifold [I]. We note that the Fisher information metric
is also invariant under transformations of the random variable $x$. We will concentrate on
general coordinate invariance, but in principle, depending on the application, any type
of symmetry could be imposed on the functional.



2 Review of Fisher information metric 

There are many excellent reviews and books on the Fisher information metric; a nice
introduction can be found in [Jj. The concept has been applied in fields as diverse as
instanton calculus [17], ontology, models for short distance modifications of space-time
[2] and econometrics. Symbolic computations of Fisher information matrices have also
been considered [12]. We shall give a brief derivation of the information metric. A
distance $d(P_1, P_2)$ between two points $P_1$ and $P_2$ has to satisfy the following three
axioms:

1. Positive definiteness: $\forall P_1, P_2$: $d(P_1, P_2) \geq 0$

2. Symmetry: $d(P_1, P_2) = d(P_2, P_1)$

3. Triangle inequality: $\forall P_1, P_2, P_3$: $d(P_1, P_2) \leq d(P_1, P_3) + d(P_3, P_2)$.

It is often useful to introduce a concept of distance between elements of a more
abstract set. For example, one could ask what the distance is between two distributions,
e.g. between the Gaussian and binomial distributions. It is useful to introduce the
concept of entropy as a means to define distances. In information theory, the Shannon
entropy ^5] represents the information content of a message or, from the receiver's point
of view, the uncertainty about the message the sender produced prior to its reception.
It is defined as

$$ -\sum_i p(i) \log p(i), \tag{2} $$


where $p(i)$ is the probability of receiving the message $i$. The unit used is the bit, i.e.
the logarithm is taken to base 2. The relative entropy can be used to define a "distance"
between two distributions $p(i)$ and $q(i)$. The Kullback-Leibler [10] distance or relative
entropy is defined as

$$ I(p \| q) = \sum_i p(i) \log \frac{p(i)}{q(i)}, \tag{3} $$

where $p(i)$ is the real distribution and $q(i)$ is an assumed distribution. Clearly the
Kullback-Leibler relative entropy is not a distance in the usual sense: it satisfies the
positive definiteness axiom, but not the symmetry or the triangle inequality axioms. It
is nevertheless useful to think of the relative entropy as a distance between distributions.
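
The failure of the symmetry axiom is easy to see numerically. The following is a minimal
sketch, not part of the original derivation: the function names and the two example
distributions are hypothetical choices made for illustration, and the logarithm is taken
to base 2 so that the results are in bits.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits; terms with p(i) = 0 contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)


def relative_entropy(p, q):
    """Kullback-Leibler relative entropy I(p||q) in bits.

    p is the real distribution and q the assumed one; q(i) must be
    positive wherever p(i) is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical example distributions over three messages.
p = [0.5, 0.25, 0.25]
q = [0.8, 0.1, 0.1]

if __name__ == "__main__":
    print("H(p)    =", shannon_entropy(p))
    print("I(p||q) =", relative_entropy(p, q))
    print("I(q||p) =", relative_entropy(q, p))  # generally differs from I(p||q)
```

Swapping the two arguments changes the value, which is why the relative entropy is only a
"distance" in a loose sense.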

The Kullback-Leibler distance is relevant to discrete sets. It can be generalized to
the case of continuous sets. For our purposes, a probability distribution over some field
(or set) $X$ is a mapping $p : X \to \mathbb{R}$, such that

1. $\int_X d^4x \, p(x) = 1$

2. For any finite subset $S \subset X$, $\int_S d^4x \, p(x) > 0$.

We shall consider families of distributions, and parametrize them by a set of continuous
parameters $\theta^i$ that take values in some open interval $M \subset \mathbb{R}^4$. We use the notation
$p_\theta$ to denote members of the family. For any fixed $\theta$, $p_\theta : x \mapsto p_\theta(x)$ is a mapping
from $X$ to $\mathbb{R}$. We shall consider the extension of the family of distributions
$F = \{p_\theta \,|\, \theta \in M\}$ to a manifold $\mathcal{M}$ such that the points $p \in \mathcal{M}$ are in one-to-one
correspondence with the distributions $p \in F$. The parameters $\theta$ of $F$ can thus be used
as coordinates on $\mathcal{M}$.

The Kullback number is the generalization of the Kullback-Leibler distance for
continuous sets. It is defined as

$$ I(p \| q) = \int_X d^4x \, p(x) \log \frac{p(x)}{q(x)}. \tag{4} $$

Let us now study the case of an infinitesimal difference between $q_\theta(x) = p_{\theta+\epsilon v}(x)$ and
$p_\theta(x)$:

$$ I(p_{\theta+\epsilon v} \| p_\theta) = \int_X d^4x \, p_{\theta+\epsilon v}(x) \log \frac{p_{\theta+\epsilon v}(x)}{p_\theta(x)}. \tag{5} $$
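
Expanding in $\epsilon$ and keeping $\theta$ and $v$ fixed, one finds:

$$ I(p_{\theta+\epsilon v} \| p_\theta) = I(\epsilon)\big|_{\epsilon=0} + \epsilon \, I'(\epsilon)\big|_{\epsilon=0} + \frac{\epsilon^2}{2} I''(\epsilon)\big|_{\epsilon=0} + O(\epsilon^3), \tag{6} $$

where $I(\epsilon)$ denotes $I(p_{\theta+\epsilon v} \| p_\theta)$ regarded as a function of $\epsilon$.
One finds $I(0) = I'(0) = 0$ and

$$ I''(0) = v^\mu \left( \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\mu} \right) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\nu} \right) \right) v^\nu. \tag{7} $$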
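
The second-order behaviour of the relative entropy can be checked on a simple case. The
sketch below is an illustration under stated assumptions rather than part of the original
argument: it uses a one-dimensional Gaussian family with coordinates (mean m, standard
deviation s) instead of the four-dimensional setting of the text, the standard closed-form
relative entropy between two Gaussians (in nats), and the well-known Fisher metric
diag(1/s^2, 2/s^2) of that family. All function names are hypothetical.

```python
import math


def kl_gaussian(m1, s1, m0, s0):
    """Closed-form relative entropy I(N(m1, s1^2) || N(m0, s0^2)) in nats."""
    return math.log(s0 / s1) + (s1 ** 2 + (m1 - m0) ** 2) / (2.0 * s0 ** 2) - 0.5


def quadratic_form(s, v, eps):
    """(eps^2 / 2) * v^mu g_{mu nu} v^nu with g = diag(1/s^2, 2/s^2)."""
    g_mm, g_ss = 1.0 / s ** 2, 2.0 / s ** 2
    return 0.5 * eps ** 2 * (g_mm * v[0] ** 2 + g_ss * v[1] ** 2)


def expansion_ratio(m, s, v, eps):
    """Exact relative entropy divided by its second-order approximation."""
    exact = kl_gaussian(m + eps * v[0], s + eps * v[1], m, s)
    return exact / quadratic_form(s, v, eps)


if __name__ == "__main__":
    # The ratio should approach 1 as the displacement eps shrinks.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, expansion_ratio(0.0, 1.0, (1.0, 0.5), eps))
```

The ratio tends to one as the displacement shrinks, which is the statement that the leading
non-vanishing term of the expansion is the quadratic one.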

We can now identify the Fisher information metric on a manifold of probability
distributions as

$$ g_{\mu\nu} = \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\mu} \right) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\nu} \right). \tag{8} $$

It has been shown that this matrix is a metric on a manifold of probability distributions,
see e.g. jT^J. Corcuera and Giummole [3] have shown that the Fisher information metric
is invariant under reparametrization of the sample space $X$ and that it is covariant under
reparametrizations of the manifold, i.e. the parameter space, see e.g. JIB] for a review.
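
For a concrete family the metric can be evaluated numerically straight from its definition
as an expectation of products of $(1/p_\theta)\,\partial p_\theta/\partial\theta^\mu$. The sketch below is only an
illustration under simplifying assumptions: the sample space is one-dimensional (the text
uses $d^4x$), the family is the Gaussian with parameters (mean, standard deviation), the
parameter derivatives are taken by central finite differences and the integral by a
midpoint rule. The function names, step size and integration range are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_metric(pdf, theta, lo, hi, n=20000, h=1e-5):
    """Approximate g[a][b] = integral of p * (dp/dtheta_a / p) * (dp/dtheta_b / p)."""
    dim = len(theta)
    g = [[0.0] * dim for _ in range(dim)]
    dx = (hi - lo) / n
    for k in range(n):
        x = lo + (k + 0.5) * dx          # midpoint of the k-th cell
        p = pdf(x, *theta)
        if p <= 0.0:                     # skip points where the density underflows
            continue
        score = []
        for a in range(dim):
            up, dn = list(theta), list(theta)
            up[a] += h
            dn[a] -= h
            dp = (pdf(x, *up) - pdf(x, *dn)) / (2.0 * h)
            score.append(dp / p)
        for a in range(dim):
            for b in range(dim):
                g[a][b] += p * score[a] * score[b] * dx
    return g


if __name__ == "__main__":
    mean, std = 0.5, 2.0
    g = fisher_metric(gaussian_pdf, (mean, std), mean - 10 * std, mean + 10 * std)
    print(g)  # to be compared with diag(1/std^2, 2/std^2)
```

For this family the result can be compared with the known closed form diag(1/std^2, 2/std^2).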

3 Dynamics of Fisher information metric 

Given a distribution $p(\theta)$, the Fisher information metric can be calculated using (8).
Here we wish to approach this issue from a different point of view and derive the Fisher
information metric from a dynamical perspective. We introduce a functional
$J[g_{\mu\nu}(\theta^i)]$. To constrain the functional dependence on $g_{\mu\nu}(\theta^i)$, we impose that this
functional be invariant under general transformations of the coordinates $\theta^i$. As already
mentioned, the Fisher information metric is covariant under reparametrizations of the
manifold, i.e. it is covariant under general coordinate transformations $\theta^i \to \theta'^i$. It thus
seems natural to posit that the functional $J[g_{\mu\nu}]$, describing the dynamics of the metric,
is invariant under these transformations. We shall first introduce a few basic quantities
which are well known from differential geometry and the general theory of relativity.

3.1 Definitions 

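Differential geometry is the tool we shall be using, just as in general relativity; the
concepts of differential geometry we need are nicely introduced in jT^]. Let us define a
few quantities. The affine connection is defined as:

$$ \Gamma^\sigma_{\lambda\mu} = \frac{1}{2} g^{\nu\sigma} \left( \frac{\partial g_{\mu\nu}}{\partial \theta^\lambda} + \frac{\partial g_{\lambda\nu}}{\partial \theta^\mu} - \frac{\partial g_{\mu\lambda}}{\partial \theta^\nu} \right). \tag{9} $$

The curvature tensor is given by:

$$ R^\lambda{}_{\mu\nu\kappa} = \frac{\partial \Gamma^\lambda_{\mu\nu}}{\partial \theta^\kappa} - \frac{\partial \Gamma^\lambda_{\mu\kappa}}{\partial \theta^\nu} + \Gamma^\eta_{\mu\nu} \Gamma^\lambda_{\kappa\eta} - \Gamma^\eta_{\mu\kappa} \Gamma^\lambda_{\nu\eta}; \tag{10} $$

it is the only tensor that can be constructed from the metric tensor $g_{\mu\nu}$ and its first and
second derivatives and that is linear in the second derivatives. We shall also need the
Ricci tensor, which reads

$$ R_{\mu\kappa} = R^\lambda{}_{\mu\lambda\kappa}, \tag{11} $$

and the curvature scalar is given by

$$ R = g^{\mu\kappa} R_{\mu\kappa}. \tag{12} $$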
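
As a worked illustration of these definitions, consider the two-parameter family of
one-dimensional Gaussians with coordinates $(m, s)$ = (mean, standard deviation). This
example is not taken from the original text; it is an assumption made for illustration,
using the well-known Fisher metric of that family. The sign convention assumed is
$R_{\mu\kappa} = R^\lambda{}_{\mu\lambda\kappa}$ with $R^\lambda{}_{\mu\nu\kappa} = \partial_\kappa \Gamma^\lambda_{\mu\nu} - \partial_\nu \Gamma^\lambda_{\mu\kappa} + \ldots$; with the opposite
convention the signs of the Ricci tensor and of $R$ flip.

```latex
% Fisher metric of the Gaussian family in coordinates (m, s):
g_{mm} = \frac{1}{s^2}, \qquad g_{ss} = \frac{2}{s^2}, \qquad g_{ms} = 0 .

% Non-vanishing connection coefficients:
\Gamma^{m}_{ms} = \Gamma^{m}_{sm} = -\frac{1}{s}, \qquad
\Gamma^{s}_{mm} = \frac{1}{2s}, \qquad
\Gamma^{s}_{ss} = -\frac{1}{s} .

% Ricci tensor:
R_{mm} = \frac{1}{2 s^2}, \qquad R_{ss} = \frac{1}{s^2}, \qquad R_{ms} = 0 .

% Curvature scalar:
R = g^{mm} R_{mm} + g^{ss} R_{ss}
  = s^2 \cdot \frac{1}{2 s^2} + \frac{s^2}{2} \cdot \frac{1}{s^2} = 1 .
```

The curvature is constant and non-zero, so the Ricci tensor of this particular
two-parameter family does not vanish.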



3.2 Dynamics 

We now have all the necessary tools to introduce the invariant functional which describes
the dynamics of the Fisher information metric. Because of 4], it seems natural to posit
that the functional $J[g_{\mu\nu}]$, describing the dynamics of the metric, is invariant under
general coordinate transformations $\theta^i \to \theta'^i$ (this situation arises in Einstein's theory of
general relativity). We shall start from the functional:

$$ J[g_{\mu\nu}] = -\frac{1}{16\pi} \int \sqrt{g(\theta)} \, R(\theta) \, d^4\theta, \tag{13} $$

where $g = \det g_{\mu\nu}$ and $R(\theta)$ is the curvature scalar. Note that $R(\theta)$ is a scalar and is
thus invariant under general coordinate transformations, $d^4\theta$ transforms as

$$ d^4\theta \to d^4\theta' = \left| \frac{\partial \theta'}{\partial \theta} \right| d^4\theta \tag{14} $$

and $\sqrt{g(\theta)}$ transforms as:

$$ \sqrt{g(\theta)} \to \sqrt{g'(\theta')} = \left| \frac{\partial \theta}{\partial \theta'} \right| \sqrt{g(\theta)}, \tag{15} $$

where $|\partial\theta'/\partial\theta|$ is the Jacobian determinant of the transformation. The two Jacobian
factors cancel in the product $\sqrt{g(\theta)} \, d^4\theta$; the functional is therefore a scalar and thus
invariant under general coordinate transformations.
The variation of $J[g_{\mu\nu}]$ with respect to $g_{\mu\nu}$ leads to

$$ \delta J[g_{\mu\nu}] = \frac{1}{16\pi} \int \sqrt{g(\theta)} \left[ R^{\mu\nu}(\theta) - \frac{1}{2} g^{\mu\nu}(\theta) R(\theta) \right] \delta g_{\mu\nu} \, d^4\theta, \tag{16} $$

where $R^{\mu\nu}(\theta)$ is the Ricci tensor introduced in eq. (11), with its indices raised by the
metric, and $R(\theta)$ is the curvature scalar (see eq. (12)). The requirement that $J[g_{\mu\nu}]$ be
stationary with respect to variations $\delta g_{\mu\nu}$ of the metric $g_{\mu\nu}$ leads to the following
partial differential equations (the Euler-Lagrange equations):

$$ R^{\mu\nu}(\theta) - \frac{1}{2} g^{\mu\nu}(\theta) R(\theta) = 0. \tag{17} $$

Contracting eq. (17) with $g_{\mu\nu}$ and using $g_{\mu\nu} g^{\mu\nu} = 4$ in four dimensions, one finds
$R - 2R = 0$, i.e. $R = 0$; the partial differential equations (the Euler-Lagrange equations)
are thus:

$$ R^{\mu\nu}(\theta) = 0, \tag{18} $$

which correspond to the equations obtained by Einstein in his theory of general relativity
[5]. If the statistical system under consideration is invariant under reparametrization
of the manifold, the dynamics of the Fisher information metric is governed by these
partial differential equations.

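Another interesting case is that of infinitesimal transformations $\theta'^\mu = \theta^\mu + \epsilon^\mu(\theta)$. The
functional $J[g_{\mu\nu}]$, being a scalar, is also invariant under this class of transformations.
This transformation of $\theta$ implies the following variation of the metric:

$$ \delta g_{\mu\nu}(\theta) = -g_{\mu\lambda}(\theta) \frac{\partial \epsilon^\lambda(\theta)}{\partial \theta^\nu} - g_{\lambda\nu}(\theta) \frac{\partial \epsilon^\lambda(\theta)}{\partial \theta^\mu} - \frac{\partial g_{\mu\nu}(\theta)}{\partial \theta^\lambda} \epsilon^\lambda(\theta). \tag{19} $$

The variation of the functional $J$ with respect to this transformation leads to the
contracted Bianchi identity:

$$ \left[ R^\mu{}_\nu - \frac{1}{2} \delta^\mu_\nu R \right]_{;\mu} = 0, \tag{20} $$

where the semicolon stands for the covariant derivative, which for a covariant vector
$V_\mu$ is defined by:

$$ V_{\mu;\nu} = \frac{\partial V_\mu}{\partial \theta^\nu} - \Gamma^\lambda_{\mu\nu} V_\lambda. \tag{21} $$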

It is possible to introduce more dynamics in the model by introducing another source
for the metric (the partial differential equations being nonlinear, the metric is already a
source for itself) in the form of a symmetric tensor $T^{\mu\nu}$ which transforms as a
contravariant tensor (i.e. $T^{\mu\nu} \to T'^{\mu\nu} = \frac{\partial \theta'^\mu}{\partial \theta^\rho} \frac{\partial \theta'^\nu}{\partial \theta^\sigma} T^{\rho\sigma}$). The modified functional then
becomes:

$$ J[g_{\mu\nu}] = -\frac{1}{16\pi} \int \sqrt{g(\theta)} \, R(\theta) \, d^4\theta + \frac{1}{2} \int \sqrt{g(\theta)} \, T^{\mu\nu}(\theta) \, g_{\mu\nu}(\theta) \, d^4\theta. \tag{22} $$

This functional leads to the partial differential equations:

$$ R^{\mu\nu}(\theta) - \frac{1}{2} g^{\mu\nu}(\theta) R(\theta) + 8\pi T^{\mu\nu}(\theta) = 0. \tag{23} $$

In the general theory of relativity the case $T^{\mu\nu}(\theta) = 0$ corresponds to empty space.
In a statistical system, the tensor $T^{\mu\nu}(\theta)$ can be used to implement constraints
on the system under consideration. Notice that symmetries, in our case general
coordinate invariance, can be broken by including terms in $T^{\mu\nu}$ that do not transform
as second-rank tensors. Imposing symmetries on a system and breaking these symmetries
usually leads to relations between the parameters of the model. This can be very useful
for applications.

Obviously the functional we have chosen is the simplest possible case. We have chosen
to build the curvature tensor using only the metric and its first and second derivatives.
One could obtain more complicated models by taking higher derivatives into account.
One could also introduce further dynamics in the model. As an example, one can add a
scalar $\phi$ to the theory and consider a so-called Brans-Dicke model pQ:

$$ J[g_{\mu\nu}] = -\frac{1}{16\pi} \int \sqrt{g(\theta)} \, \phi(\theta) R(\theta) \, d^4\theta + \text{dynamics for } \phi(\theta). \tag{24} $$

This functional is also a scalar, i.e. it is invariant under general coordinate
transformations, but it leads to a different set of partial differential equations.

3.3 Constraints for the probability distributions 

If one inserts the expression (8) for the Fisher information metric into the partial
differential equations (18), obtained by postulating that the functional is invariant under
general coordinate transformations (note that $R^{\mu\nu}$ and $R$ are functions of $g_{\mu\nu}$), one finds
a lengthy and complicated expression. It is a differential equation for the distribution
$p_\theta(x)$. It is best to first find solutions to the partial differential equations (17) or (18),
to obtain the metric constrained by the symmetries of the problem, and then to deduce
the constraints on the distribution $p_\theta(x)$. Finding an exact solution to the partial
differential equations (17) or (18) is in general very difficult; however, when the problem
under consideration has enough symmetries, certain solutions are known.

Obviously, a trivial diagonal metric with constant entries of the type (1, 1) is a
solution of the partial differential equations (17). One obtains the following constraints
for probability distributions:

$$ \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\mu} \right) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\nu} \right) = \delta_{\mu\nu}; $$

this constraint is trivial and, for example, Gaussian distributions fulfill it.
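
To make the Gaussian case explicit, here is a short worked check. It assumes, for
illustration only, a four-dimensional Gaussian with unit covariance whose parameters
$\theta^\mu$ are the components of its mean; this specific choice of family is an assumption
and is not spelled out in the original text.

```latex
% Assumed family: unit-covariance Gaussian on X = R^4, parametrized by its mean.
p_\theta(x) = \frac{1}{(2\pi)^{2}}
  \exp\!\Big( -\tfrac{1}{2} \sum_{\mu} (x^\mu - \theta^\mu)^2 \Big),
\qquad
\frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\mu} = x^\mu - \theta^\mu .

% The metric is then the covariance of x, which is the identity:
g_{\mu\nu} = \int_X d^4x \, p_\theta(x) \, (x^\mu - \theta^\mu)(x^\nu - \theta^\nu)
           = \delta_{\mu\nu} .
```

The metric is constant, so all connection coefficients and hence the Ricci tensor vanish.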

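Another example is that of a four-dimensional problem which is spherically symmetric
(isotropic) in three coordinates, say $\theta^1$, $\theta^2$ and $\theta^3$, and with a metric that is independent
of the fourth coordinate $\theta^0$. In that case a solution was found by Schwarzschild
(see also e.g. [IS]). Using spherical coordinates for the three coordinates $\theta^1$, $\theta^2$ and $\theta^3$,
and denoting the four coordinates by $(\tau, r, \theta, \phi)$, the metric is diagonal with entries
$\left( 1 - \frac{a}{r}, \left( 1 - \frac{a}{r} \right)^{-1}, r^2, r^2 \sin^2\theta \right)$, where $a$ is an integration constant. We thus
obtain the following constraints for a probability distribution (in these equations the
bare symbol $\theta$ denotes the polar angle, while the subscript of $p_\theta$ still denotes the full set
of parameters):

$$ \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \tau} \right)^2 = 1 - \frac{a}{r}, $$

$$ \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial r} \right)^2 = \left( 1 - \frac{a}{r} \right)^{-1}, $$

$$ \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta} \right)^2 = r^2, $$

$$ \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \phi} \right)^2 = r^2 \sin^2\theta, $$

$$ \int_X d^4x \, p_\theta(x) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\mu} \right) \left( \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta^\nu} \right) = 0 \quad \text{for } \mu \neq \nu. $$

This example illustrates how to obtain non-trivial constraints on the probability
distributions.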
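
That the diagonal metric quoted above has a vanishing Ricci tensor can be checked
numerically. The following is a rough sketch under stated assumptions, not the authors'
procedure: it handles diagonal metrics only, approximates all derivatives by central finite
differences, and assumes the sign convention $R_{\mu\kappa} = R^\lambda{}_{\mu\lambda\kappa}$ with
$R^\lambda{}_{\mu\nu\kappa} = \partial_\kappa \Gamma^\lambda_{\mu\nu} - \partial_\nu \Gamma^\lambda_{\mu\kappa} + \ldots$. The function names, the step size and the
evaluation point are hypothetical.

```python
import math


def schwarzschild_diag(theta, a=1.0):
    """Diagonal metric entries in coordinates (tau, r, polar angle, phi)."""
    _, r, ang, _ = theta
    f = 1.0 - a / r
    return [f, 1.0 / f, r * r, (r * math.sin(ang)) ** 2]


def christoffel(metric, theta, h=1e-4):
    """gamma[s][l][m] approximates Gamma^s_{lm} for a diagonal metric."""
    n = len(theta)
    g = metric(theta)
    dg = []  # dg[k][m] = d g_mm / d theta^k
    for k in range(n):
        up, dn = list(theta), list(theta)
        up[k] += h
        dn[k] -= h
        gu, gd = metric(up), metric(dn)
        dg.append([(gu[m] - gd[m]) / (2.0 * h) for m in range(n)])
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for l in range(n):
            for m in range(n):
                term = 0.0
                if m == s:
                    term += dg[l][s]   # d g_{ms} / d theta^l
                if l == s:
                    term += dg[m][s]   # d g_{ls} / d theta^m
                if m == l:
                    term -= dg[s][m]   # d g_{ml} / d theta^s
                gamma[s][l][m] = term / (2.0 * g[s])
    return gamma


def ricci(metric, theta, h=1e-4):
    """ric[mu][ka] approximates R_{mu ka} = R^lam_{mu lam ka}."""
    n = len(theta)
    gam = christoffel(metric, theta)
    dgam = []  # dgam[k][s][l][m] = d Gamma^s_{lm} / d theta^k
    for k in range(n):
        up, dn = list(theta), list(theta)
        up[k] += h
        dn[k] -= h
        gu, gd = christoffel(metric, up), christoffel(metric, dn)
        dgam.append([[[(gu[s][l][m] - gd[s][l][m]) / (2.0 * h)
                       for m in range(n)] for l in range(n)] for s in range(n)])
    ric = [[0.0] * n for _ in range(n)]
    for mu in range(n):
        for ka in range(n):
            total = 0.0
            for lam in range(n):
                total += dgam[ka][lam][mu][lam] - dgam[lam][lam][mu][ka]
                for eta in range(n):
                    total += gam[eta][mu][lam] * gam[lam][ka][eta]
                    total -= gam[eta][mu][ka] * gam[lam][lam][eta]
            ric[mu][ka] = total
    return ric


if __name__ == "__main__":
    point = [0.0, 3.0, 1.0, 0.2]  # hypothetical evaluation point with r > a
    for row in ricci(schwarzschild_diag, point):
        print(["%.1e" % entry for entry in row])
```

Up to finite-difference noise, every component printed should be negligible compared with
the individual connection coefficients, which are of order one at this point.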



4 Conclusions 

In this work we have presented a method to generate probability distributions that
correspond to metrics obeying partial differential equations generated by extremizing a
functional $J[g_{\mu\nu}(\theta^i)]$, where $g_{\mu\nu}(\theta^i)$ is the Fisher metric. We have postulated that this
functional of the dynamical variable $g_{\mu\nu}(\theta^i)$ is stationary with respect to small variations
of this variable. Our approach provides a dynamical treatment of the Fisher information
metric and allows one to impose symmetries on a statistical system in a systematic way.
We have presented different models and some solutions to these partial differential
equations. There is a very nice analogy between the Fisher information metric and
Einstein's theory of general relativity. We have argued that since the Fisher information
metric is covariant under reparametrizations of the manifold, i.e. under general
coordinate transformations $\theta^i \to \theta'^i$, it is natural to posit that the functional $J[g_{\mu\nu}]$,
describing the dynamics of the metric, is invariant under these transformations. This
led us to the functional that determines the dynamics of our models. As pointed out at
the very beginning of the paper, we foresee several application domains, such as
reasoning or quantum computing. An additional application, currently under
investigation, is a classical one for stochastic methodologies: classification in
insurance [9]. We expect to refine the classification process through symmetry
considerations.